When there are additional limitations on how experimental units can be randomly assigned to treatment groups, more complicated experimental designs are needed so that the analysis properly reflects the restricted randomization. We will cover split-plot designs and repeated measures.
There are situations where the logistical constraints of setting up an experiment require separate random assignments of factor levels: the levels of one factor are assigned to larger experimental units (whole plots), while the levels of other factors are assigned to smaller experimental units (sub plots) within each whole plot. This unequal scale of randomization requires specific code to properly represent the design in the model. While we will be using linear mixed effects models here, we will wait until the mixed effects models lesson to go into the details of this analysis.
The data for this example are a slightly modified version of the yield (kg/ha) trial from the RCBD example, now laid out as a split-plot design (Gomez & Gomez 1984). The trial had 4 genotypes (G) and 6 nitrogen levels (N or n_amount) with 3 complete replicates (rep). Within each replicate there are six main plots (columns), and the six nitrogen treatments are randomly assigned to these main plots. Within each main plot, the four genotypes are randomly assigned to subplots. Here, replicates are the ‘blocks’ in the classical split-plot terminology.
See the “tidy” format
## Rows: 72 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): rep, mainplot, G, N
## dbl (4): yield, row, col, n_amount
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
| yield | row | col | rep | mainplot | G | N | n_amount |
|---|---|---|---|---|---|---|---|
| 4520 | 4 | 1 | rep1 | mp1 | A | N1 | 0 |
| 5598 | 2 | 2 | rep1 | mp2 | A | N2 | 60 |
| 6192 | 1 | 3 | rep1 | mp3 | A | N4 | 120 |
| 8542 | 2 | 4 | rep1 | mp4 | A | N6 | 180 |
| 5806 | 2 | 5 | rep1 | mp5 | A | N3 | 90 |
| 7470 | 1 | 6 | rep1 | mp6 | A | N5 | 150 |
The variables rep, mainplot, N, and G need to be encoded as factors instead of the character type R uses by default. The mutate_at function changes the selected variables (rep, mainplot, N, and G) to the chosen type (as.factor).
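A minimal sketch of that conversion, assuming the data were already read into an object named dat (the read_csv output shown above):

```r
library(dplyr)

# Convert the design and treatment variables from character to factor;
# mutate_at applies as.factor to each listed column
dat <- dat %>%
  mutate_at(vars(rep, mainplot, N, G), as.factor)

str(dat) # confirm the four columns are now factors
```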
We can produce the field layout of the trial using the desplot() function. The data object needs columns that identify the row and column of each plot in the trial. The form argument fills the color of each experimental unit (by rep in this instance) with a header for each rep. The text argument shows the genotype name in each plot. The remaining arguments handle formatting, including the main title.
desplot(data = dat,
form = rep ~ col + row | rep, # fill color per rep, headers per rep
text = G, cex = 1, shorten = "no", # show genotype names per plot
col = N, # color of genotype names for each N-level
out1 = mainplot, out1.gpar = list(col = "black"), # lines between mainplots
out2 = row,out2.gpar = list(col = "darkgrey"), # lines between rows
main = "Field Layout", show.key = T, key.cex = 0.7) # formatting

The field was first divided into three blocks (replicates). Then, each block was divided into six main plots (whole plots). Separately for each block, the six fertilizer treatments were randomly allocated to the main plots. Every main plot was then split into four sub plots to accommodate the four varieties, and separately for each main plot, the varieties were randomly allocated to the four sub plots. It is important to recognize that nitrogen (main plot) was randomized according to a randomized complete block design with the replicates as blocks. Varieties (sub plot) were also randomized according to a randomized complete block design where the main plot is the block. This type of split-plot design is the most common, but there are many other forms of the split-plot design based on how the main plot and sub plot factors are randomized.
We then look at the arithmetic means and standard deviations for each genotype and each nitrogen level separately, as well as for their combinations. You can also arrange the tibble by decreasing mean values.
dat %>%
group_by(G) %>%
dplyr::summarize(mean = mean(yield),
std.dev = sd(yield)) %>%
  arrange(desc(mean))

| G | mean | std.dev |
|---|---|---|
| A | 6553.556 | 1475.277 |
| B | 6155.500 | 1077.678 |
| C | 5563.444 | 1269.312 |
| D | 3642.111 | 1433.630 |
| N | mean | std.dev |
|---|---|---|
| N3 | 5866.250 | 832.4376 |
| N4 | 5864.417 | 1433.5578 |
| N5 | 5812.000 | 2349.4996 |
| N6 | 5796.750 | 2659.6879 |
| N2 | 5478.167 | 657.3105 |
| N1 | 4054.333 | 671.5853 |
dat %>%
group_by(N, G) %>%
dplyr::summarize(mean = mean(yield),
std.dev = sd(yield)) %>%
arrange(desc(mean)) %>%
  print(n = Inf) # show more than default 10 rows

## `summarise()` has grouped output by 'N'. You can override using the `.groups`
## argument.
## # A tibble: 24 × 4
## # Groups: N [6]
## N G mean std.dev
## <fct> <fct> <dbl> <dbl>
## 1 N6 A 8701. 270.
## 2 N5 A 7563. 86.9
## 3 N5 B 6951. 808.
## 4 N4 B 6895 166.
## 5 N4 A 6733. 490.
## 6 N5 C 6687. 496.
## 7 N6 B 6540. 936.
## 8 N3 A 6400 523.
## 9 N3 B 6259 499.
## 10 N6 C 6065. 1097.
## 11 N4 C 6014 515.
## 12 N3 C 5994 101.
## 13 N2 B 5982 684.
## 14 N2 A 5672 458.
## 15 N2 C 5443. 589.
## 16 N2 D 4816 506.
## 17 N3 D 4812 963.
## 18 N1 D 4481. 463.
## 19 N1 B 4306 646.
## 20 N1 A 4253. 248.
## 21 N4 D 3816 1311.
## 22 N1 C 3177. 453.
## 23 N5 D 2047. 703.
## 24 N6 D 1881. 407.
A simple plot of the raw data also helps to visualize the distribution of yield for each genotype-nitrogen treatment combination. The point geometry produces a scatter plot, the limits argument of scale_y_continuous forces the y-axis to start at 0, and theme_bw produces a cleaner plot format.
ggplot(data = dat,
aes(y = yield, x = N, color = N)) + # different colors for the level of nitrogen
facet_wrap(~ G) + #facet per G level - "labeller = label_both" adds the variable name to each facet graph
geom_point() +
scale_y_continuous(limits = c(0, NA), # make y = axis start at 0
expand = expansion(mult = c(0, 0.1)) # no space below 0
) +
scale_x_discrete(name = NULL) + # x-axis with no name label
theme_bw() + # alternative plot theme with more grid lines - personal preference for theme_classic
  theme(legend.position = "bottom") # legend on bottom

We fit a linear model with yield as the response variable. The remaining parts of the model can be broken into two pieces - the design effects and the treatment effects. The treatment effects are genotype, nitrogen level, and their interaction as fixed effects. For the split-plot design there are two randomization units to represent in the linear model - the main plots and the sub plots. Each randomization unit needs to be represented by a random effect so that each has its own error term.
The goal is to determine if there is a difference in yield between varieties and where those differences are.
- The interaction notation N * G is short for N + G + N:G
- (1|rep) represents the block-level variance - random effect
- (1|rep:mainplot) represents the whole-plot (main plot) error - random effect
- The residual represents the sub-plot variability not captured by the main plot variance
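The model fit itself is not shown above; the following sketch is consistent with the output below, using lmer from lmerTest (the object name mod_re is taken from the later emmeans code):

```r
library(lmerTest) # lmer with Satterthwaite df for t- and F-tests

# Split-plot model: fixed N, G, and their interaction; random block and
# whole-plot (rep:mainplot) effects; the residual is the sub-plot error
mod_re <- lmer(yield ~ N * G + (1 | rep) + (1 | rep:mainplot), data = dat)
summary(mod_re)
```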
## boundary (singular) fit: see help('isSingular')
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: yield ~ N * G + (1 | rep) + (1 | rep:mainplot)
## Data: dat
##
## REML criterion at convergence: 780.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.8506 -0.5356 -0.1311 0.4778 1.9272
##
## Random effects:
## Groups Name Variance Std.Dev.
## rep:mainplot (Intercept) 50855.53547 225.51172
## rep (Intercept) 0.00307 0.05541
## Residual 349441.51914 591.13579
## Number of obs: 72, groups: rep:mainplot, 18; rep, 3
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 4252.67 365.28 45.78 11.642 0.00000000000000279 ***
## NN2 1419.33 516.59 45.78 2.748 0.00856 **
## NN3 2147.33 516.59 45.78 4.157 0.00014 ***
## NN4 2480.00 516.59 45.78 4.801 0.00001725039156940 ***
## NN5 3310.67 516.59 45.78 6.409 0.00000007184143597 ***
## NN6 4448.00 516.59 45.78 8.610 0.00000000003937369 ***
## GB 53.33 482.66 36.00 0.110 0.91263
## GC -1075.33 482.66 36.00 -2.228 0.03222 *
## GD 228.67 482.66 36.00 0.474 0.63853
## NN2:GB 256.67 682.58 36.00 0.376 0.70911
## NN3:GB -194.33 682.58 36.00 -0.285 0.77750
## NN4:GB 109.00 682.58 36.00 0.160 0.87402
## NN5:GB -666.00 682.58 36.00 -0.976 0.33572
## NN6:GB -2213.67 682.58 36.00 -3.243 0.00255 **
## NN2:GC 846.00 682.58 36.00 1.239 0.22321
## NN3:GC 669.33 682.58 36.00 0.981 0.33334
## NN4:GC 356.67 682.58 36.00 0.523 0.60451
## NN5:GC 199.33 682.58 36.00 0.292 0.77194
## NN6:GC -1560.00 682.58 36.00 -2.285 0.02828 *
## NN2:GD -1084.67 682.58 36.00 -1.589 0.12079
## NN3:GD -1816.67 682.58 36.00 -2.661 0.01155 *
## NN4:GD -3145.33 682.58 36.00 -4.608 0.00004945084378565 ***
## NN5:GD -5745.33 682.58 36.00 -8.417 0.00000000050133641 ***
## NN6:GD -7048.67 682.58 36.00 -10.326 0.00000000000261524 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 24 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
| | F | Df | Df.res | Pr(>F) |
|---|---|---|---|---|
| (Intercept) | 135.538147 | 1 | 45.78314 | 0.0000000 |
| N | 17.621854 | 5 | 41.96547 | 0.0000000 |
| G | 3.016614 | 3 | 36.00000 | 0.0424007 |
| N:G | 13.235986 | 15 | 36.00000 | 0.0000000 |
There are three types of sums of squares (1, 2, or 3). This is an important consideration when the data are unbalanced (missing data). To demonstrate the differences, consider a model that includes two categorical independent variables (Factor A and Factor B) and tests the two main effects and their interaction. Type 1 sums of squares, also called “sequential” sums of squares, tests Factor A, then Factor B, then the interaction of Factors A and B. When the data are unbalanced, the sums of squares will give a different result depending on which main effect is considered first. It is rare that you want to test the main effects sequentially, so normally you should use Type 2 or Type 3 sums of squares.
Type 2 sums of squares is used only when you are not testing an interaction, and tests each main effect after the other main effect. Type 3 sums of squares tests each main effect after accounting for all other effects, including interactions; it is commonly used when you are testing an interaction. If you test the interaction and it is not significant, then you can test the main effects alone with Type 2 sums of squares - it is more powerful than Type 3.
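A hedged sketch of requesting the different types with car::Anova(), assuming mod_re is the fitted split-plot model; note that for valid Type 3 tests the factors should use sum-to-zero contrasts:

```r
library(car)

# Type 2: each main effect tested after the other main effect
Anova(mod_re, type = "II", test.statistic = "F")

# Type 3: each effect tested after all others, including the interaction;
# requires sum-to-zero contrasts to be meaningful
Anova(mod_re, type = "III", test.statistic = "F")
```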
This link can provide more details for your reference.
There are multiple ways to assess the model assumptions - the autoplot function used before does not work with lmer models.
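One possible sketch of the checks producing the output below; the residual object name epsilon matches the Shapiro-Wilk output, and the grouping used for the Levene test is an assumption consistent with its 23 and 48 degrees of freedom:

```r
library(car)

epsilon <- residuals(mod_re) # conditional residuals of the mixed model

# Normality of residuals
shapiro.test(epsilon)

# Homogeneity of variance across the 24 N-by-G treatment combinations
leveneTest(epsilon ~ interaction(N, G), data = dat)
```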
##
## Shapiro-Wilk normality test
##
## data: epsilon
## W = 0.96729, p-value = 0.05772
| | Df | F value | Pr(>F) |
|---|---|---|---|
| group | 23 | 0.4656747 | 0.9752327 |
| | 48 | NA | NA |
Results indicate the satisfaction of linear model assumptions, so we can move onto multiple comparisons tests.
Following the significant F-test for the interaction, we compare the treatment means. We will show all pairwise comparisons of the nitrogen-genotype combinations, using the colon between the two factors.
all_mean_comparisons_tukey_re <- mod_re %>%
emmeans(specs = ~ N:G) %>%
cld(Letters = letters)
all_mean_comparisons_tukey_re

| | N | G | emmean | SE | df | lower.CL | upper.CL | .group |
|---|---|---|---|---|---|---|---|---|
| 24 | N6 | D | 1880.667 | 365.2839 | 45.78314 | 1145.294 | 2616.039 | a |
| 23 | N5 | D | 2046.667 | 365.2839 | 45.78314 | 1311.294 | 2782.039 | a |
| 13 | N1 | C | 3177.333 | 365.2839 | 45.78314 | 2441.961 | 3912.706 | ab |
| 22 | N4 | D | 3816.000 | 365.2839 | 45.78314 | 3080.628 | 4551.372 | abc |
| 1 | N1 | A | 4252.667 | 365.2839 | 45.78314 | 3517.294 | 4988.039 | bcd |
| 7 | N1 | B | 4306.000 | 365.2839 | 45.78314 | 3570.628 | 5041.372 | bcd |
| 19 | N1 | D | 4481.333 | 365.2839 | 45.78314 | 3745.961 | 5216.706 | bcde |
| 21 | N3 | D | 4812.000 | 365.2839 | 45.78314 | 4076.628 | 5547.372 | bcdef |
| 20 | N2 | D | 4816.000 | 365.2839 | 45.78314 | 4080.628 | 5551.372 | bcdef |
| 14 | N2 | C | 5442.667 | 365.2839 | 45.78314 | 4707.294 | 6178.039 | cdefg |
| 2 | N2 | A | 5672.000 | 365.2839 | 45.78314 | 4936.628 | 6407.372 | cdefgh |
| 8 | N2 | B | 5982.000 | 365.2839 | 45.78314 | 5246.628 | 6717.372 | defgh |
| 15 | N3 | C | 5994.000 | 365.2839 | 45.78314 | 5258.628 | 6729.372 | defgh |
| 16 | N4 | C | 6014.000 | 365.2839 | 45.78314 | 5278.628 | 6749.372 | defgh |
| 18 | N6 | C | 6065.333 | 365.2839 | 45.78314 | 5329.961 | 6800.706 | defgh |
| 9 | N3 | B | 6259.000 | 365.2839 | 45.78314 | 5523.628 | 6994.372 | defgh |
| 3 | N3 | A | 6400.000 | 365.2839 | 45.78314 | 5664.628 | 7135.372 | efgh |
| 12 | N6 | B | 6540.333 | 365.2839 | 45.78314 | 5804.961 | 7275.706 | fgh |
| 17 | N5 | C | 6687.333 | 365.2839 | 45.78314 | 5951.961 | 7422.706 | fgh |
| 4 | N4 | A | 6732.667 | 365.2839 | 45.78314 | 5997.294 | 7468.039 | fghi |
| 10 | N4 | B | 6895.000 | 365.2839 | 45.78314 | 6159.628 | 7630.372 | ghi |
| 11 | N5 | B | 6950.667 | 365.2839 | 45.78314 | 6215.294 | 7686.039 | ghi |
| 5 | N5 | A | 7563.333 | 365.2839 | 45.78314 | 6827.961 | 8298.706 | hi |
| 6 | N6 | A | 8700.667 | 365.2839 | 45.78314 | 7965.294 | 9436.039 | i |
Create a plot that displays both the raw data and the results from the multiple comparisons tests of the adjusted means from the linear model. First create a tibble with an additional column that combines the N and G variables, and then sort the rows by increasing values of emmean. Then create a similar table for the raw data.
all_mean_comparisons_tukey_re <- all_mean_comparisons_tukey_re %>%
as_tibble() %>%
mutate(N_G = paste0(N,"-",G)) %>%
mutate(N_G = fct_reorder(N_G, emmean))
dat <- dat %>%
mutate(N_G = paste0(N,"-",G)) %>%
mutate(N_G = fct_relevel(N_G, levels(all_mean_comparisons_tukey_re$N_G)))
RCBD_Plot_tukey <- ggplot() +
geom_point(data = dat, aes(y = yield, x = N_G, color = N), position = position_nudge(x = -0.4)) + # black dots representing the raw data
geom_boxplot(data =dat, aes(y = yield, x = N_G), width = 0.25, outlier.shape = NA, position = position_nudge(x = -0.2)) + # black boxplot
geom_point(data = all_mean_comparisons_tukey_re, aes(y = emmean, x = N_G), size = 2, color = "red") + # red dots representing the adjusted means
geom_errorbar(data = all_mean_comparisons_tukey_re, aes(ymin = lower.CL, ymax = upper.CL, x = N_G), color = "red", width = 0.1) + # red error bars representing the confidence limits of the adjusted means
geom_text(data = all_mean_comparisons_tukey_re, aes(y = 10000, x = N_G, label = str_trim(.group)), color = "red", position = position_nudge(x = -0.2), hjust = 0, angle = 90) + # red letters
scale_y_continuous(name = "Yield", limits = c(0, NA), expand = expansion(mult = c(0, 0.1))) +
scale_x_discrete(name = "Nitrogen-Genotype combination") +
theme_classic() + # clearer plot format
  labs(caption=str_wrap("Black dots represent raw data. Red dots and error bars represent estimated marginal means ± 95% confidence interval per group. Means not sharing any letter are significantly different by the Tukey test at the 5% level of significance.", width = 120)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1), legend.position = "bottom")
RCBD_Plot_tukey

When repeated observations are made on the same individual experimental unit, one of the linear model assumptions - no autocorrelation among errors - is violated. This analysis requires some specific steps to account for the temporal autocorrelation. We will first fit a model that assumes no autocorrelation. Then we will fit multiple models that assume different types of temporal autocorrelation and compare them to see which fits the data best.
The data come from a sorghum trial laid out as a randomized complete block design with five blocks and four sorghum varieties as the only treatment factor. The response variable - leaf area index - was measured in five consecutive weeks on each plot, starting two weeks after emergence. The repeated observation variable (week) needs to be treated as a factor since all measurements were made at discrete time points. This dataset comprises 100 values of longitudinal data - repeated measures or time series.
The week factor is not a treatment factor that can be randomized. The repeated measurements of the same experimental unit are likely to be serially correlated, which needs to be accounted for to produce an unbiased estimate of the differences between varieties.
We also include the steps to format the necessary variables to be factors.
# data - reformatting agriTutorial::sorghum
dat <- agriTutorial::sorghum %>% # data from agriTutorial package
rename(block = Replicate,
weekF = factweek, # week as factor
weekN = varweek, # week as numeric/integer
plot = factplot) %>%
mutate(variety = paste0("var", variety), # variety id
block = paste0("block", block), # block id
weekF = paste0("week", weekF), # week id
plot = paste0("plot", plot), # plot id
unit = paste0("obs", 1:n() )) %>% # observation id - needed for glmmTMB model
mutate_at(vars(variety:plot, unit), as.factor)
head(dat)

| y | variety | block | weekF | plot | weekN | varblock | unit |
|---|---|---|---|---|---|---|---|
| 5.00 | var1 | block1 | week1 | plot1 | 1 | 1 | obs1 |
| 4.84 | var1 | block1 | week2 | plot1 | 2 | 1 | obs2 |
| 4.02 | var1 | block1 | week3 | plot1 | 3 | 1 | obs3 |
| 3.75 | var1 | block1 | week4 | plot1 | 4 | 1 | obs4 |
| 3.13 | var1 | block1 | week5 | plot1 | 5 | 1 | obs5 |
| 4.42 | var1 | block2 | week1 | plot2 | 1 | 2 | obs6 |
The lack of row and column variables prevents us from using the desplot function to create a figure showing the experimental design. Descriptive tables will show the arithmetic means and standard deviations of the leaf area index per variety. We can use the pivot_wider function to get mean values for each week.
dat %>%
group_by(variety) %>%
summarize(mean = mean(y, na.rm = TRUE), std.dev = sd(y, na.rm = TRUE)) %>%
  arrange(desc(mean)) # sort

| variety | mean | std.dev |
|---|---|---|
| var3 | 4.6340 | 0.7568741 |
| var4 | 4.6052 | 0.8024033 |
| var2 | 4.5900 | 0.7634352 |
| var1 | 3.4960 | 0.7601425 |
dat %>%
group_by(weekF, variety) %>%
summarize(mean = mean(y, na.rm = TRUE)) %>%
  pivot_wider(names_from = weekF, values_from = mean) # pivot wider creates the table with each week as a column

## `summarise()` has grouped output by 'weekF'. You can override using the
## `.groups` argument.
| variety | week1 | week2 | week3 | week4 | week5 |
|---|---|---|---|---|---|
| var1 | 4.242 | 4.046 | 3.406 | 3.086 | 2.700 |
| var2 | 5.148 | 4.980 | 4.594 | 4.220 | 4.008 |
| var3 | 4.888 | 5.074 | 4.714 | 4.476 | 4.018 |
| var4 | 5.154 | 4.944 | 4.706 | 4.388 | 3.834 |
We can create an animated plot that shows the change in leaf area index by variety across the five weeks.
var_colors <- c("#8cb369", "#f4a259", "#5b8e7d", "#bc4b51")
names(var_colors) <- dat$variety %>% levels()
gganimate_plot <- ggplot(
data = dat, aes(y = y, x = weekF,
group = variety,
color = variety)) +
geom_boxplot(outlier.shape = NA) +
geom_point(alpha = 0.5, size = 3) +
scale_y_continuous(
name = "Leaf area index",
limits = c(0, 6.5),
expand = c(0, 0),
breaks = c(0:6)
) +
scale_color_manual(values = var_colors) +
theme_bw() +
theme(legend.position = "bottom",
axis.title.x = element_blank()) +
transition_time(weekN) +
shadow_mark(exclude_layer = 2)
animate(gganimate_plot, renderer = gifski_renderer()) # render gif

An inefficient approach is to model a subset of the data from one week, using either a fixed or a random effect for block.
dat.wk1 <- dat %>% filter(weekF == "week1") # subset data from first week only
mod.wk1 <- lm(y ~ variety + block, data = dat.wk1)
anova(mod.wk1)

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| variety | 3 | 2.76036 | 0.9201200 | 46.59032 | 0.0000007 |
| block | 4 | 8.31897 | 2.0797425 | 105.30786 | 0.0000000 |
| Residuals | 12 | 0.23699 | 0.0197492 | NA | NA |
| | Sum Sq | Mean Sq | NumDF | DenDF | F value | Pr(>F) |
|---|---|---|---|---|---|---|
| variety | 2.76036 | 0.92012 | 3 | 12 | 46.59032 | 0.0000007 |
dat.wk2 <- dat %>% filter(weekF == "week2")
mod.wk2 <- lm(y ~ variety + block, data = dat.wk2)
anova(mod.wk2)

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| variety | 3 | 3.45322 | 1.151073 | 55.97245 | 0.0000003 |
| block | 4 | 8.60998 | 2.152495 | 104.66788 | 0.0000000 |
| Residuals | 12 | 0.24678 | 0.020565 | NA | NA |
| | Sum Sq | Mean Sq | NumDF | DenDF | F value | Pr(>F) |
|---|---|---|---|---|---|---|
| variety | 3.45322 | 1.151073 | 3 | 12 | 55.97245 | 0.0000003 |
We can also fit a model with all weeks combined.
It is reasonable to assume that the treatment effects evolve over time and thus are week-specific. Furthermore, there could be variability among blocks over the weeks. The glmmTMB function allows us more flexibility than the lmer function.
mod.iid <- glmmTMB(y ~ weekF * variety + (1|block) + (1|unit), # add random unit term to mimic error variance
dispformula = ~ 0, # fix original error variance to 0
REML = TRUE, # needs to be stated since default is ML
data = dat)
mod.iid %>% broom.mixed::tidy(effects = "ran_pars", scales = "vcov")

| effect | component | group | term | estimate |
|---|---|---|---|---|
| ran_pars | cond | block | var__(Intercept) | 0.4144806 |
| ran_pars | cond | unit | var__(Intercept) | 0.0271694 |
These results assume an independent and homogeneous error term, which may be wrong. We will now test possible covariance structures for the error term that could improve overall model fit. This should be done before assessing the significance of any effect - the model assumptions should be met before assessing differences among the varieties.
The most popular correlation structure is first-order autoregressive, AR(1). This error structure works best when all time points are equally spaced. Other variance structures are possible and should also be considered.
mod.AR1 <- glmmTMB(formula = y ~ weekF * variety + (1|block) +
ar1(weekF + 0 | plot), # ar1 structure as random term to mimic error var, and + 0 is used to remove intercept term from random term covariance specification, which is required to properly define the covariance matrix for repeated measures
dispformula = ~ 0, # fix original error variance to 0
REML = TRUE, # needs to be stated since default = ML
data = dat)
# Extract variance component estimates
mod.AR1 %>% broom.mixed::tidy(effects = "ran_pars", scales = "vcov")

| effect | component | group | term | estimate |
|---|---|---|---|---|
| ran_pars | cond | block | var__(Intercept) | 0.3916641 |
| ran_pars | cond | plot | var__weekFweek1 | 0.0325258 |
| ran_pars | cond | plot | var__weekFweek2 | 0.0325258 |
| ran_pars | cond | plot | var__weekFweek3 | 0.0325258 |
| ran_pars | cond | plot | var__weekFweek4 | 0.0325258 |
| ran_pars | cond | plot | var__weekFweek5 | 0.0325258 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek2 | 0.0243886 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek3 | 0.0182872 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek4 | 0.0137122 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek5 | 0.0102817 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek3 | 0.0243886 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek4 | 0.0182872 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek5 | 0.0137122 |
| ran_pars | cond | plot | cov__weekFweek3.weekFweek4 | 0.0243886 |
| ran_pars | cond | plot | cov__weekFweek3.weekFweek5 | 0.0182872 |
| ran_pars | cond | plot | cov__weekFweek4.weekFweek5 | 0.0243886 |
Compound symmetry can be modeled as homogeneous (nlme only - not shown here) or heterogeneous (glmmTMB) for the error term. It assumes the correlation is constant regardless of how far apart the measurements are. Heterogeneous compound symmetry is a simple extension of the homogeneous structure with additional parameters that allow the variances along the diagonal of the matrix to differ.
mod.hCS <- glmmTMB(formula = y ~ weekF * variety + (1|block) +
cs(weekF + 0 | plot), # hcs structure as random term to mimic error var
dispformula = ~ 0, # fix original error variance to 0
REML = TRUE, # needs to be stated since default = ML
data = dat)
# show variance components
mod.hCS %>% broom.mixed::tidy(effects = "ran_pars", scales = "vcov")

| effect | component | group | term | estimate |
|---|---|---|---|---|
| ran_pars | cond | block | var__(Intercept) | 0.8033429 |
| ran_pars | cond | plot | var__weekFweek1 | 0.0552712 |
| ran_pars | cond | plot | var__weekFweek2 | 0.0584155 |
| ran_pars | cond | plot | var__weekFweek3 | 0.1155951 |
| ran_pars | cond | plot | var__weekFweek4 | 0.1247201 |
| ran_pars | cond | plot | var__weekFweek5 | 0.1990451 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek2 | 0.0522395 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek3 | 0.0734860 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek4 | 0.0763314 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek5 | 0.0964297 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek3 | 0.0755474 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek4 | 0.0784726 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek5 | 0.0991347 |
| ran_pars | cond | plot | cov__weekFweek3.weekFweek4 | 0.1103885 |
| ran_pars | cond | plot | cov__weekFweek3.weekFweek5 | 0.1394541 |
| ran_pars | cond | plot | cov__weekFweek4.weekFweek5 | 0.1448538 |
Toeplitz (glmmTMB only) can be computed faster than the similar AR(1) approach. Toeplitz differs from AR(1) in that the correlations are not forced to follow the AR(1) decay pattern - each lag receives its own correlation parameter. This correlation structure is only available via the glmmTMB function.
mod.Toep <- glmmTMB(formula = y ~ weekF * variety + (1|block) +
toep(weekF + 0 | plot), # toep structure as random term to mimic err var
dispformula = ~ 0, # fix original error variance to 0
REML = TRUE, # needs to be stated since default = ML
                    data = dat)

## Warning in (function (start, objective, gradient = NULL, hessian = NULL, :
## NA/NaN function evaluation

# Extract variance component estimates
mod.Toep %>% broom.mixed::tidy(effects = "ran_pars", scales = "vcov")
| effect | component | group | term | estimate |
|---|---|---|---|---|
| ran_pars | cond | block | var__(Intercept) | 0.3436258 |
| ran_pars | cond | plot | var__weekFweek1 | 0.0393570 |
| ran_pars | cond | plot | var__weekFweek2 | 0.0392134 |
| ran_pars | cond | plot | var__weekFweek3 | 0.0236161 |
| ran_pars | cond | plot | var__weekFweek4 | 0.0342313 |
| ran_pars | cond | plot | var__weekFweek5 | 0.0301218 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek2 | 0.0300907 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek3 | 0.0175201 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek4 | 0.0099941 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek5 | -0.0005475 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek3 | 0.0233091 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek4 | 0.0210548 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek5 | 0.0093579 |
| ran_pars | cond | plot | cov__weekFweek3.weekFweek4 | 0.0217781 |
| ran_pars | cond | plot | cov__weekFweek3.weekFweek5 | 0.0153273 |
| ran_pars | cond | plot | cov__weekFweek4.weekFweek5 | 0.0245955 |
The unstructured covariance is the most “liberal” structure, allowing every variance and covariance term to differ. In doing so it requires fitting the most parameters.
mod.UN <- glmmTMB(formula = y ~ weekF * variety + (1|block) +
us(weekF + 0 | plot), # us structure as random term to mimic error var
dispformula = ~ 0, # fix original error variance to 0
REML = TRUE, # needs to be stated since default = ML
data = dat)
# show variance components
mod.UN %>% broom.mixed::tidy(effects = "ran_pars", scales = "vcov")

| effect | component | group | term | estimate |
|---|---|---|---|---|
| ran_pars | cond | block | var__(Intercept) | 0.4181621 |
| ran_pars | cond | plot | var__weekFweek1 | 0.0243949 |
| ran_pars | cond | plot | var__weekFweek2 | 0.0264585 |
| ran_pars | cond | plot | var__weekFweek3 | 0.0223323 |
| ran_pars | cond | plot | var__weekFweek4 | 0.0312693 |
| ran_pars | cond | plot | var__weekFweek5 | 0.0408654 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek2 | 0.0199855 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek3 | 0.0087999 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek4 | 0.0126196 |
| ran_pars | cond | plot | cov__weekFweek1.weekFweek5 | -0.0028035 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek3 | 0.0109330 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek4 | 0.0153952 |
| ran_pars | cond | plot | cov__weekFweek2.weekFweek5 | -0.0009543 |
| ran_pars | cond | plot | cov__weekFweek3.weekFweek4 | 0.0248921 |
| ran_pars | cond | plot | cov__weekFweek3.weekFweek5 | 0.0197851 |
| ran_pars | cond | plot | cov__weekFweek4.weekFweek5 | 0.0246111 |
In order to select the best model here, we can simply compare their AIC values, since all models are identical regarding their fixed effects. The smaller the value of AIC, the better the fit.
AICcmodavg::aictab(
cand.set = list(mod.iid, mod.hCS, mod.AR1, mod.Toep, mod.UN),
modnames = c("iid", "hCS", "AR1", "Toeplitz", "UN"),
second.ord = FALSE, # get AIC instead of AICc
sort = TRUE
)

| | Modnames | K | AIC | Delta_AIC | ModelLik | AICWt | LL | Cum.Wt |
|---|---|---|---|---|---|---|---|---|
| 5 | UN | 36 | -9.7867784 | 0.000000 | 1.0000000 | 0.9335234 | 40.893389 | 0.9335234 |
| 3 | AR1 | 23 | -4.3473941 | 5.439384 | 0.0658950 | 0.0615146 | 25.173697 | 0.9950379 |
| 4 | Toeplitz | 30 | 0.6958276 | 10.482606 | 0.0052934 | 0.0049415 | 29.652086 | 0.9999794 |
| 2 | hCS | 27 | 11.6540463 | 21.440825 | 0.0000221 | 0.0000206 | 21.172977 | 1.0000000 |
| 1 | iid | 22 | 37.6615506 | 47.448329 | 0.0000000 | 0.0000000 | 3.169225 | 1.0000000 |
The UN correlation structure has the lowest AIC value, and no other model comes within 2 AIC units of it (Delta AIC < 2 is a common threshold for considering models equivalent). We will now review the UN model in the following steps.
The Anova function produces a table of chi-square test statistics to determine whether there are significant differences among the levels of the fixed effects, including the varieties tested.
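A hedged sketch of the call producing the table below; car::Anova() on a glmmTMB fit returns Wald chi-square tests, and the inclusion of the intercept row suggests type 3 tests were requested:

```r
library(car)

# Wald chi-square tests for the fixed effects of the repeated measures model
Anova(mod.UN, type = "III")
```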
| | Chisq | Df | Pr(>Chisq) |
|---|---|---|---|
| (Intercept) | 203.3025 | 1 | 0 |
| weekF | 299.8838 | 4 | 0 |
| variety | 113.1530 | 3 | 0 |
| weekF:variety | 111.1423 | 12 | 0 |
Before we look at any multiple comparisons we need to confirm that the model assumptions have been met.
##
## Shapiro-Wilk normality test
##
## data: epsilon
## W = 0.9935, p-value = 0.9162
| | Df | F value | Pr(>F) |
|---|---|---|---|
| group | 19 | 1.595601 | 0.0776386 |
| | 80 | NA | NA |
Results indicate the satisfaction of linear model assumptions, so we can move onto multiple comparisons tests.
all_mean_comparisons_tukey_UN <- mod.UN %>% # all possible comparisons aren't needed based on primary objectives for comparing varieties within week
emmeans(specs = ~ variety:weekF) %>%
cld(Letters = letters)
withinWeek_mean_comparisons_tukey_UN <- mod.UN %>%
emmeans(specs = ~ variety|weekF) %>%
  cld(Letters = letters)

Create a plot that displays both the raw data and the results from the within-week multiple comparisons tests of the adjusted means from the linear model, with one facet per week.
withinWeek_UN_Plot_tukey <- ggplot() +
facet_wrap(~ weekF, labeller = label_both, nrow = 1) + # facet per G level
geom_point(data = dat, aes(y = y, x = variety)) + # dots representing the raw data
geom_point(data = withinWeek_mean_comparisons_tukey_UN, aes(y = emmean, x = variety), color = "red", position = position_nudge(x = 0.1)) + # red dots representing the adjusted means
geom_errorbar(data = withinWeek_mean_comparisons_tukey_UN, aes(ymin = lower.CL, ymax = upper.CL, x = variety), color = "red", width = 0.1, position = position_nudge(x = 0.1)) + # red error bars representing the confidence limits of the adjusted means
geom_text(data = withinWeek_mean_comparisons_tukey_UN, aes(y = emmean, x = variety, label = str_trim(.group)), color = "red", position=position_nudge(x = 0.2), hjust = 0) + # red letters
scale_y_continuous(name = "Yield", limits = c(0, NA), expand = expansion(mult = c(0, 0.1))) +
scale_x_discrete(name = NULL) +
theme_classic() + # clearer plot format
labs(caption = str_wrap("Black dots represent raw data. Red dots and error bars represent estimated marginal means ± 95% confidence interval per group. Means not sharing any letter are significantly different by the t-test at the 5% level of significance.", width = 120)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1), legend.position = "bottom")
withinWeek_UN_Plot_tukey

We will use data from an experiment conducted in a greenhouse in Silver Bay, Minnesota. Plants exposed to three different treatments were observed over a 40-day period. Make sure to convert the relevant variables to factors for the repeated measures analysis. Fit a linear model assuming no random effects, a linear mixed effects model with a random intercept for each plant, and linear mixed effects models with unstructured, heterogeneous compound symmetry, autoregressive order-1, homogeneous diagonal, and heterogeneous diagonal variance structures.
library(agridat)
data(pederson.lettuce.repeated)
dat <- pederson.lettuce.repeated
library(lattice)
dat <- dat[order(dat$day),]
xyplot(weight ~ day|trt, dat, type = 'l', group = plant, layout = c(3, 1),
main = "pederson.lettuce.repeated")

You really want to see the answers without trying yourself???
## 'data.frame': 594 obs. of 4 variables:
## $ plant : int 1 2 3 4 5 6 7 8 9 10 ...
## $ day : int 1 1 1 1 1 1 1 1 1 1 ...
## $ trt : Factor w/ 3 levels "T1","T2","T3": 1 1 1 1 1 1 1 1 1 1 ...
## $ weight: num 44 48.4 55.5 48.6 48.1 42.6 55.4 47.5 48.8 51.2 ...
dat$plant <- as.factor(dat$plant)
dat$day <- as.factor(dat$day)
# Fitting linear model with no random effect
mod_lm <- lm(weight ~ trt * day, data = dat)
# Fitting linear mixed model with random effect
mod <- glmmTMB(weight ~ trt * day + (1|plant), data = dat)
# Code to try when issues of convergence pop up
# control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS"))
# control = glmmTMBControl(maxfun = 20000)
mod.UN <- glmmTMB(weight ~ trt * day + us(day + 0|plant),
dispformula = ~ 0, REML = TRUE, data = dat)

## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; false convergence (8). See vignette('troubleshooting'),
## help('diagnose')
mod.UN <- glmmTMB(weight ~ trt * day + us(day + 0|plant),
dispformula = ~ 0, REML = TRUE, data = dat,
control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS")))

## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; . See vignette('troubleshooting'), help('diagnose')
# Unstructured correlation - fits an estimate for each pair of observations over time
# The overparameterized-model warning makes sense: the unstructured covariance option fits a covariance estimate for every pair of time points
mod.CS <- glmmTMB(weight ~ trt * day + cs(day + 0|plant),
dispformula = ~ 0, REML = TRUE, data = dat)

## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; false convergence (8). See vignette('troubleshooting'),
## help('diagnose')
mod.CS <- glmmTMB(weight ~ trt * day + cs(day + 0|plant),
dispformula = ~ 0, REML = TRUE, data = dat,
control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS")))
mod.HomD <- glmmTMB(weight ~ trt * day + homdiag(day + 0|plant),
dispformula = ~ 0, REML = TRUE, data = dat)
mod.HetD <- glmmTMB(weight ~ trt * day + diag(day + 0|plant),
dispformula = ~ 0, REML = TRUE, data = dat)

## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; false convergence (8). See vignette('troubleshooting'),
## help('diagnose')
# Convergence warning - try different optimizer
# Very common to get convergence warnings for small datasets of some covariance structures
mod.HetD <- glmmTMB(weight ~ trt * day + diag(day + 0|plant),
dispformula = ~ 0, REML = TRUE, data = dat,
control = glmmTMBControl(optimizer = optim, optArgs = list(method = "BFGS")))
mod.AR1 <- glmmTMB(weight ~ trt * day + ar1(day + 0|plant),
dispformula = ~ 0, REML = TRUE, data = dat)
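The ar1() structure assumes that the correlation between two measurements on the same plant decays geometrically with their separation in time, corr(y_i, y_j) = rho^|i - j|, so only a single correlation parameter is estimated. A small base-R sketch of the implied correlation matrix (rho = 0.7 is an arbitrary illustrative value, not a model estimate):

```r
# Implied AR(1) correlation matrix for t equally spaced time points
ar1_corr <- function(t, rho) rho^abs(outer(seq_len(t), seq_len(t), "-"))
round(ar1_corr(4, rho = 0.7), 3)  # correlations shrink as |i - j| grows
```

This parsimony (one correlation parameter plus variances) is why AR1 often converges when richer structures like us() do not.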
AICcmodavg::aictab(
cand.set = list(mod, mod.CS, mod.HomD, mod.HetD, mod.AR1),
modnames = c("iid", "hCS", "HomD", "HetD", "Ar1"),
second.ord = FALSE) # get AIC instead of AICc

| | Modnames | K | AIC | Delta_AIC | ModelLik | AICWt | LL | Cum.Wt |
|---|---|---|---|---|---|---|---|---|
| 5 | Ar1 | 35 | 2257.262 | 0.0000 | 1 | 1 | -1093.631 | 1 |
| 2 | hCS | 45 | 2718.281 | 461.0183 | 0 | 0 | -1314.140 | 1 |
| 1 | iid | 35 | 2999.319 | 742.0571 | 0 | 0 | -1464.660 | 1 |
| 4 | HetD | 44 | 3409.611 | 1152.3487 | 0 | 0 | -1660.806 | 1 |
| 3 | HomD | 34 | 3498.414 | 1241.1513 | 0 | 0 | -1715.207 | 1 |
# The AR1 model is the best-fitting model
# Now check the assumptions and make any transformations as needed
mod.AR1 <- glmmTMB(weight ~ trt * day + ar1(day + 0|plant),
dispformula = ~ 0, REML = TRUE, data = dat)
plot(residuals(mod.AR1) ~ fitted(mod.AR1)) # less obvious
abline(h = 0)

##
## Shapiro-Wilk normality test
##
## data: residuals(mod.AR1)
## W = 0.93534, p-value = 0.000000000000002314
# Try polynomial term for day - orthogonal polynomial (not raw polynomial coefficients)
mod.AR1_poly <- glmmTMB(weight ~ trt * poly(day, 2) + ar1(day + 0|plant),
dispformula = ~ 0, REML = TRUE, data = dat)
qqnorm(residuals(mod.AR1_poly))
qqline(residuals(mod.AR1_poly))

##
## Shapiro-Wilk normality test
##
## data: residuals(mod.AR1_poly)
## W = 0.97337, p-value = 0.000000006418
# Other resources to assess assumptions for more complicated models
sims = simulateResiduals(mod.AR1)
plot(sims, quantreg = T)

# KS test: DHARMa scaled residuals are expected to follow a uniform distribution
# (not a normal distribution); p-values close to 1.0 indicate a good fit, while
# values close to 0.0 indicate a poor fit.
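To illustrate why the DHARMa KS test targets a uniform distribution: a base-R ks.test() against Uniform(0, 1) applied to perfectly uniform "scaled residuals" returns a p-value near 1. This is just the idea behind the diagnostic, not DHARMa's internals:

```r
# Perfectly uniform scaled residuals on (0, 1), as a well-specified model would produce
u <- (seq_len(200) - 0.5) / 200
ks.test(u, "punif")$p.value  # close to 1: no evidence against uniformity
```

In practice DHARMa simulates residuals from the fitted model and compares their empirical distribution to uniform in the same spirit.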
mod.AR1 <- glmmTMB(log(weight) ~ trt * day + ar1(day + 0|plant),
dispformula = ~ 0, REML = TRUE, data = dat)
sims = simulateResiduals(mod.AR1)
plot(sims, quantreg = T)

##
## Shapiro-Wilk normality test
##
## data: residuals(mod.AR1)
## W = 0.92093, p-value < 0.00000000000000022
| | Chisq | Df | Pr(>Chisq) |
|---|---|---|---|
| (Intercept) | 56476.784784 | 1 | 0.0000000 |
| trt | 1.069904 | 2 | 0.5856973 |
| day | 2123.165480 | 10 | 0.0000000 |
| trt:day | 225.952277 | 20 | 0.0000000 |
within_means_tukey <- mod.AR1 %>%
emmeans(~ trt|day) %>%
cld(Letters = letters) %>%
as_tibble() %>%
rename(weight = emmean)
within_means_tukey

| trt | day | weight | SE | df | lower.CL | upper.CL | .group |
|---|---|---|---|---|---|---|---|
| T2 | 1 | 3.886148 | 0.0163905 | 559 | 3.853953 | 3.918342 | a |
| T1 | 1 | 3.895169 | 0.0163905 | 559 | 3.862974 | 3.927363 | a |
| T3 | 1 | 3.909896 | 0.0163905 | 559 | 3.877702 | 3.942091 | a |
| T1 | 5 | 3.909989 | 0.0163905 | 559 | 3.877794 | 3.942183 | a |
| T2 | 5 | 3.915754 | 0.0163905 | 559 | 3.883559 | 3.947948 | a |
| T3 | 5 | 3.954929 | 0.0163905 | 559 | 3.922734 | 3.987123 | a |
| T1 | 8 | 3.945891 | 0.0163905 | 559 | 3.913696 | 3.978085 | a |
| T2 | 8 | 3.947583 | 0.0163905 | 559 | 3.915388 | 3.979777 | a |
| T3 | 8 | 3.974883 | 0.0163905 | 559 | 3.942689 | 4.007078 | a |
| T2 | 12 | 3.973312 | 0.0163905 | 559 | 3.941118 | 4.005507 | a |
| T1 | 12 | 3.980963 | 0.0163905 | 559 | 3.948768 | 4.013157 | a |
| T3 | 12 | 4.004698 | 0.0163905 | 559 | 3.972503 | 4.036892 | a |
| T2 | 15 | 3.994317 | 0.0163905 | 559 | 3.962123 | 4.026512 | a |
| T1 | 15 | 4.003509 | 0.0163905 | 559 | 3.971315 | 4.035704 | a |
| T3 | 15 | 4.030246 | 0.0163905 | 559 | 3.998051 | 4.062440 | a |
| T2 | 18 | 4.015537 | 0.0163905 | 559 | 3.983343 | 4.047731 | a |
| T1 | 18 | 4.028281 | 0.0163905 | 559 | 3.996086 | 4.060475 | a |
| T3 | 18 | 4.059404 | 0.0163905 | 559 | 4.027209 | 4.091598 | a |
| T2 | 21 | 4.079512 | 0.0163905 | 559 | 4.047317 | 4.111706 | a |
| T3 | 21 | 4.097663 | 0.0163905 | 559 | 4.065469 | 4.129858 | a |
| T1 | 21 | 4.097701 | 0.0163905 | 559 | 4.065506 | 4.129895 | a |
| T3 | 26 | 4.191027 | 0.0163905 | 559 | 4.158833 | 4.223221 | a |
| T2 | 26 | 4.197972 | 0.0163905 | 559 | 4.165777 | 4.230166 | a |
| T1 | 26 | 4.230361 | 0.0163905 | 559 | 4.198167 | 4.262556 | a |
| T2 | 33 | 4.279586 | 0.0163905 | 559 | 4.247392 | 4.311781 | a |
| T1 | 33 | 4.321250 | 0.0163905 | 559 | 4.289056 | 4.353445 | a |
| T3 | 33 | 4.326201 | 0.0163905 | 559 | 4.294007 | 4.358396 | a |
| T2 | 36 | 4.364414 | 0.0163905 | 559 | 4.332219 | 4.396608 | a |
| T3 | 36 | 4.390723 | 0.0163905 | 559 | 4.358528 | 4.422917 | a |
| T1 | 36 | 4.404045 | 0.0163905 | 559 | 4.371851 | 4.436240 | a |
| T3 | 40 | 4.407026 | 0.0163905 | 559 | 4.374832 | 4.439221 | a |
| T2 | 40 | 4.432813 | 0.0163905 | 559 | 4.400619 | 4.465007 | ab |
| T1 | 40 | 4.474017 | 0.0163905 | 559 | 4.441823 | 4.506212 | b |
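Because the response was modeled as log(weight), the means and confidence limits in the table above are on the log scale, and the plot below back-transforms them with exp(). A minimal sketch using the first row of the table (with the actual emmeans object, `emmeans(..., type = "response")` should perform this back-transformation automatically, since emmeans detects the log() in the model formula):

```r
# Back-transform the first row of the log-scale comparison table to grams
log_est <- c(mean = 3.886148, lcl = 3.853953, ucl = 3.918342)
round(exp(log_est), 2)  # roughly 48.7 g with a CI of about (47.2, 50.3)
```

Note that back-transformed intervals are asymmetric around the mean, which is expected after exponentiation.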
within_means_tukey_Plot <- ggplot(dat, aes(y = weight, x = trt)) +
facet_wrap(~ day, labeller = label_both) + # one facet per day
geom_point(data = dat, aes(y = weight, x = trt, color = trt),
alpha = 0.5, position = position_jitter(width = 0.1)) + # dots representing the raw data
geom_point(data = within_means_tukey, aes(y = exp(weight), x = trt),
color = "red", position = position_nudge(x = 0.2)) + # red dots representing the adjusted means
geom_errorbar(data = within_means_tukey, aes(ymin = exp(lower.CL), ymax = exp(upper.CL), x = trt), color = "red", width = 0.1, position = position_nudge(x = 0.2)) + # red error bars representing the confidence limits of the adjusted means
geom_text(data = within_means_tukey, aes(y = exp(weight), x = trt, label = str_trim(.group)), color = "black", position = position_nudge(x = 0.3), hjust = 0) + # letters from the compact letter display
scale_y_continuous(name = "Weight (g)", limits = c(30, NA), expand = expansion(mult = c(0, 0.1))) +
scale_x_discrete(name = NULL) +
theme_classic() + # clearer plot format
labs(caption = str_wrap("Opaque dots represent raw data. Red dots and error bars represent estimated marginal means ± 95% confidence interval per group. Means not sharing any letter are significantly different by the t-test at the 5% level of significance.", width = 120)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1), legend.position = "bottom")
within_means_tukey_Plot